Compression By Induction of Hierarchical Grammars
Abstract
Adaptive compression methods build models of symbol sequences. In many areas of computer science, models of sequences are constructed for their explanatory value. In contrast, data compression schemes use models that are opaque in that they do not provide descriptions of the sequence that can be understood or applied in other domains. Statistical methods that compress text well invariably generate large models that are not so much a structural description of the sequence as a record of frequencies of short substrings. Macro models replace repeated text by references to earlier occurrences and generally work within a small moving window of symbols so that any implicit model is transient. In both cases the model is flat and does not build up abstractions by combining references into higher level phrases.
Similar resources
(0, 1)-Matrix-Vector Products via Compression by Induction of Hierarchical Grammars
We demonstrate a method for reducing the number of arithmetic operations within a (0, 1)-matrix-vector product. We employ an algorithm, SEQUITUR, developed for lossless text compression, which generates a context-free grammar derived from an inherent hierarchy of repeated sequences. In this context, the sequences are composed of bit patterns for a set of adjacent columns. This grammar will repre...
Unsupervised Grammar Induction in a Framework of Information Compression by Multiple Alignment, Unification and Search
This paper describes a novel approach to grammar induction that has been developed within a framework designed to integrate learning with other aspects of computing, AI, mathematics and logic. This framework, called information compression by multiple alignment, unification and search (ICMAUS), is founded on principles of Minimum Length Encoding pioneered by Solomonoff and others. Most of the p...
Compression and Explanation Using Hierarchical Grammars
This paper describes an algorithm, called SEQUITUR, that identifies hierarchical structure in sequences of discrete symbols and uses that information for compression. On many practical sequences it performs well at both compression and structural inference, producing comprehensible descriptions of sequence structure in the form of grammar rules. The algorithm can be stated concisely in the form...
Conciseness of Associative Language Descriptions
Associative Language Descriptions are a recent grammar model, theoretically less powerful than Context Free grammars, but adequate for describing the syntax of programming languages. ALD do not use nonterminal symbols, but rely on permissible contexts for specifying valid syntax trees. In order to assess ALD adequacy, we analyze the descriptional complexity of structurally equivalent CF and ALD...
Associative Definition of Programming Languages
Associative Language Descriptions are a recent grammar model, theoretically less powerful than Context Free grammars, but adequate for describing the syntax of programming languages. ALD do not use nonterminal symbols, but rely on permissible contexts for specifying valid syntax trees. In order to assess ALD adequacy, we analyze the descriptional complexity of structurally equivalent CF and ALD...
Publication date: 1994